Automated Metrics That Agree With Human Judgements On Generated Output for an Embodied Conversational Agent

Author

  • Mary Ellen Foster
Abstract

When evaluating a generation system, if a corpus of target outputs is available, a common and simple strategy is to compare the system output against the corpus contents. However, cross-validation metrics that test whether the system makes exactly the same choices as the corpus on each item have recently been shown not to correlate well with human judgements of quality. An alternative evaluation strategy is to compute intrinsic, task-specific properties of the generated output; this requires more domain-specific metrics, but can often produce a better assessment of the output. In this paper, a range of metrics using both of these techniques are used to evaluate three methods for selecting the facial displays of an embodied conversational agent, and the predictions of the metrics are compared with human judgements of the same generated output. The corpus-reproduction metrics show no relationship with the human judgements, while the intrinsic metrics that capture the number and variety of facial displays show a significant correlation with the preferences of the human users.
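The two metric families contrasted in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the `"none"` label for "no display", and the toy sequences are all assumptions made for illustration. The first function scores exact agreement with a corpus item by item; the other two compute simple intrinsic properties (how many displays were produced, and how many distinct kinds).

```python
def corpus_agreement(generated, corpus):
    """Fraction of items where the generated choice equals the corpus choice."""
    if len(generated) != len(corpus):
        raise ValueError("sequences must align item by item")
    if not corpus:
        return 0.0
    matches = sum(g == c for g, c in zip(generated, corpus))
    return matches / len(corpus)


def display_count(generated, none_label="none"):
    """Intrinsic metric: number of items that carry any facial display."""
    return sum(d != none_label for d in generated)


def display_variety(generated, none_label="none"):
    """Intrinsic metric: number of distinct display types used."""
    return len({d for d in generated if d != none_label})


# Hypothetical toy data: one display label per word or phrase.
generated = ["nod", "none", "brow_raise", "nod", "none"]
corpus = ["nod", "nod", "none", "nod", "none"]

agreement = corpus_agreement(generated, corpus)
count = display_count(generated)
variety = display_variety(generated)
```

A system can score poorly on `corpus_agreement` while still producing many varied displays, which is the kind of divergence between the two metric families that the abstract reports.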


Similar resources

Timing and Rhythm in Multimodal Communication for Conversational Agents

Synthesis of lifelike gesture is finding growing attention in human-computer interaction. In particular, synchronization of synthetic gestures with speech output is one of the goals for embodied conversational agents which have become a new paradigm for the study of gesture and for human-computer interface (Cassell et al., 2000). Embodied conversational agents are computer-generated characters ...


Enhancing Human-Computer Interaction with Embodied Conversational Agents

We survey recent research in which the impact of an embodied conversational agent on human-computer interaction has been assessed through a human evaluation. In some cases, the evaluation involved comparing different versions of the agent against itself in the context of a full interactive system; in others, it measured the effect on user perception of spoken output of specific aspects of the e...


Embodied Conversational Agent - Based Kiosk for Automated Interviewing

We have created an automated kiosk that uses embodied intelligent agents to interview individuals and detect changes in arousal, behavior, and cognitive effort by using psychophysiological information systems. In this paper, we describe the system and propose a unique class of intelligent agents, which are described as Special Purpose Embodied Conversational Intelligence with Environmental Sens...


Evaluating Dialogs based on Grice’s Maxims

There is no agreed upon standard for the evaluation of conversational dialog systems, which are well-known to be hard to evaluate due to the difficulty in pinning down metrics that will correspond to human judgements and the subjective nature of human judgment itself. We explored the possibility of using Grice’s Maxims to evaluate effective communication in conversation. We collected some syste...


Conversational Agents, Humorous Act Construction, and Social Intelligence

Humans use humour to ease communication problems in human-human interaction, and in a similar way humour can be used to solve communication problems that arise in human-computer interaction. We discuss the role of embodied conversational agents in human-computer interaction and we have observations on the generation of humorous acts and on the appropriateness of displaying them by embodied conv...




Publication date: 2008